Storage Consideration
When we talk about storage, we mean how data is physically stored on disk (or SSD, memory). This includes:
- How tables and indexes are organized on disk.
- How much space they use.
- How efficiently data is read/written during queries.
Key Storage Considerations
Data Types and Storage Size
- Choosing the right data type affects storage.
- Example:
- INT (4 bytes) vs BIGINT (8 bytes).
- If you only need values up to 1M, INT is enough.
- Storing unnecessarily large types wastes space and slows I/O.
Row Storage Format
- Fixed-length storage:
- Faster to access (rows are predictable size).
- But wastes space if values are small.
- Example: CHAR(50) always takes 50 bytes.
- Variable-length storage:
- Saves space (only stores needed bytes).
- Slightly slower (extra lookup).
- Example: VARCHAR(50) only stores actual characters + length info.
Normalization vs Denormalization
- Normalized tables: reduce redundancy, save storage space, but may require joins (extra reads).
- Denormalized tables: may increase storage (duplication), but speed up queries.
- Trade-off depends on workload (OLTP vs OLAP).
Index Storage
- Every index takes extra disk space.
- Example:
- A table of 10M rows, indexed on 3 columns, might double storage needs.
- Indexes speed up queries but slow down inserts/updates.
Null and Sparse Data
- Columns with many NULL values should be carefully stored.
- Some DBMS support sparse columns (store only non-null values).
- Example: Employee table with 100 optional attributes → better to split into separate tables.
LOB (Large Objects) Storage
- Images, PDFs, audio, video = BLOBs (Binary Large Objects).
- Text documents = CLOBs (Character Large Objects).
- Options for storage:
- Inside the database (slower queries, but consistent backups).
- Outside the database (filesystem or cloud), with just a reference in DB (faster, smaller DB).
Partitioning & Tablespaces
- Partitioning (split data into smaller parts) improves manageability + query speed.
- Tablespaces allow storing different tables/indexes on different physical disks → balance I/O load.
Compression
- Many DBMS support table and index compression.
- Reduces disk usage and I/O, but may increase CPU load.
- Example:
- Compressing a log table with millions of rows that are rarely updated.
Caching & Buffer Pools
- Databases use memory caches to reduce disk reads.
- Storage design should optimize which data fits in cache (e.g., hot vs cold data).